Grammatical annotation of the Portuguese C-ORAL Corpus
Similar resources
Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary
In this paper, we present an automatic system for the morphosyntactic annotation and lexicographical evaluation of historical Portuguese corpora. Using rule-based orthographical normalization, we were able to apply a standard parser (PALAVRAS) to historical data (Colonia corpus) and to achieve accurate annotation for both POS and syntax. By aligning original and standardized word forms, our met...
The C-ORAL-BRASIL I: Reference Corpus for Spoken Brazilian Portuguese
C-ORAL-BRASIL I is a Brazilian Portuguese spontaneous speech corpus compiled following the same architecture adopted by the C-ORAL-ROM resource. The main goal is the documentation of the diaphasic and diastratic variations in Brazilian Portuguese. The diatopic variety represented is that of the metropolitan area of Belo Horizonte, capital city of Minas Gerais. Even though it was not a primary g...
The annotation of the C-ORAL-BRASIL spoken corpus using an adaptation of the Palavras Parser
This article describes the morphosyntactic annotation of the C-ORAL-BRASIL speech corpus, using an adapted version of the Palavras parser. In order to achieve compatibility with annotation rules designed for standard written Portuguese, transcribed words were orthographically normalized, and the parsing lexicon augmented with speech-specific material, phonetically spelled abbreviations etc. Usi...
Challenges in modality annotation in a Brazilian Portuguese Spontaneous Speech Corpus
This short paper introduces the first notes about a modality annotation system that is under development for a spontaneous speech Brazilian Portuguese corpus (C-ORAL-BRASIL). We indicate our methodological decisions, the points which seem to be well resolved and two issues for further discussion and investigation.
When CORDIAL Becomes Friendly: Endowing the CORDIAL Corpus with a Syntactic Annotation Layer
This paper reports on the syntactic annotation of a previously compiled and tagged corpus of European Portuguese (EP) dialects – The Syntax-oriented Corpus of Portuguese Dialects (CORDIAL-SIN). The parsed version of CORDIAL-SIN is intended to be a more efficient resource for the purpose of studying dialect syntax by allowing automated searches for various syntactic constructions of interest. To...
Publication date: 2011